Adaptive Audio-Visual Speech Recognition with Distorted Audio and Video Data
Authors
Martin Heckmann (1), Frédéric Berthommier (2), Christophe Savariaux (2), Kristian Kroschel (3)
(1) Honda Research Institute Europe, 63073 Offenbach, Germany, Email: [email protected]
(2) Institut de la Communication Parlée (ICP), 38031 Grenoble, France, Email: {bertho, savario}@icp.inpg.fr
(3) Institut für Nachrichtentechnik, Universität Karlsruhe, 76128 Karlsruhe, Germany, Email: [email protected]
Similar resources
Improved Speech Recognition using Adaptive Audio-visual Fusion via a Stochastic Secondary Classifier
The adaptive fusion of video and audio is one of the fundamental pursuits of audio-visual speech recognition (AVSR). In this paper, the use of a high-dimensional secondary classifier on the word likelihood scores from both the audio and video modalities is investigated for the purposes of adaptive fusion. Results are presented that lie above or equal to the boundary of catastrophic fusion acros...
Adaptive Audio-visual Speech Recognition in the Presence of Audio and Video Distortions
Audio-visual speech recognition leads to significant improvements compared to pure audio recognition, especially when the audio signal is corrupted by noise. In this article we investigate the consequences of additional degradations in the video signal on the audio-visual recognition process. We degrade the images with noise, JPEG compression, and errors in the localization of the mouth regio...
Audiovisual speech recognition with missing or unreliable data
In order to robustly recognize distorted speech, use of visual information has been proven valuable in many recent investigations. However, visual features may not always be available, and they can be unreliable in unfavorable recording conditions. The same is true for distorted audio information, where noise and interference can corrupt some of the acoustic speech features used for recognition...
Open-Domain Audio-Visual Speech Recognition: A Deep Learning Approach
Automatic speech recognition (ASR) on video data naturally has access to two modalities: audio and video. In previous work, audio-visual ASR, which leverages visual features to help ASR, has been explored on restricted domains of videos. This paper aims to extend this idea to open-domain videos, for example videos uploaded to YouTube. We achieve this by adopting a unified deep learning approach...
Characteristics of the Use of Coupled Hidden Markov Models for Audio-Visual Polish Speech Recognition
This paper focuses on combining audio-visual signals for Polish speech recognition in conditions of a highly disturbed audio speech signal. Recognition of audio-visual speech was based on combined hidden Markov models (CHMM). The described methods were developed for a single isolated command; nevertheless, their effectiveness indicated that they would also work similarly in continuous audio-visual sp...